This function carries out a hold-out experiment for a given learning
system on a given data set. The goal of the experiment is to estimate
the values of a set of evaluation statistics by means of the hold-out
method. Hold-out estimates are obtained by randomly dividing the given
data set into two disjoint partitions, one used for obtaining the
prediction model and the other for testing it. This learn+test process
is repeated k times. The hold-out estimate is the average of the k
scores obtained across the repetitions.

It is
the user's responsibility
to decide which statistics are evaluated on each repetition and how
they are calculated. This is done by writing a function that the
hold-out routine calls at each repetition of the learn+test process.
This user-defined function must assume
that its first three arguments are a formula, a training set and a
test set, in that order. It should also accept any further parameters,
which are to be passed on to the learning algorithm. The result of
this user-defined function should be a named vector with the values of
the statistics to be estimated, as obtained by the learner when
trained on the given training set and tested on the given test set.
See the Examples section below for an example of such a function.

If the itsInfo parameter is set to the value
TRUE, the hldRun object returned by the function will have an
attribute named itsInfo containing extra information from the
individual repetitions of the hold-out process. This information can
be accessed with the function attr(),
e.g. attr(returnedObject,'itsInfo'). For this information to be
collected in this attribute, the user-defined function must return the
vector of evaluation statistics with an associated attribute named
itInfo (note that it is "itInfo" and not "itsInfo" as above), which
should be a list containing whatever information the user wants to
collect on each repetition. This apparently complex
infrastructure allows you to pass whatever information you wish from
each repetition of the experimental process. A typical example is the
case where you want to check the individual predictions of the model
on each test case of each repetition. You could pass this vector of
predictions as a component of the list forming the itInfo attribute of
the statistics returned by your user-defined function. At the end of
the experimental process you can inspect or use these predictions
through the itsInfo attribute of the hldRun object returned by the
holdOut() function. See the Examples section for an illustration of
this capability.
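
The contract described above can be sketched in base R. This is only
an illustration under stated assumptions: lm_eval, simple_hold_out,
reps and test_frac are hypothetical names introduced here, the learner
is lm() applied to the built-in cars data set, and the loop is a
simplified stand-in that returns a plain named vector rather than an
hldRun object. It is not the implementation of holdOut().

```r
# Hypothetical user-defined evaluation function (lm_eval is an assumed name).
# Its first three arguments are a formula, a training set and a test set;
# anything passed through ... is forwarded to the learner (here lm()).
lm_eval <- function(form, train, test, ...) {
  model <- lm(form, data = train, ...)
  preds <- predict(model, newdata = test)
  truth <- test[[all.vars(form)[1]]]  # response named on the formula's left side

  # Named vector with the statistics to be estimated
  stats <- c(mae = mean(abs(truth - preds)),
             mse = mean((truth - preds)^2))

  # Optional per-repetition extras, stored under "itInfo" (not "itsInfo")
  attr(stats, "itInfo") <- list(preds = preds)
  stats
}

# Simplified stand-in for the hold-out loop, for illustration only.
# simple_hold_out, reps and test_frac are hypothetical.
simple_hold_out <- function(form, data, eval_fun, reps = 5,
                            test_frac = 0.3, ...) {
  runs <- lapply(seq_len(reps), function(i) {
    # Random split into a test partition and a training partition
    test_idx <- sample(nrow(data), size = round(test_frac * nrow(data)))
    eval_fun(form, data[-test_idx, ], data[test_idx, ], ...)
  })

  # c() keeps the names but drops other attributes; one row per repetition
  scores <- do.call(rbind, lapply(runs, function(s) c(s)))

  # The estimate is the average of the scores over the repetitions
  estimate <- colMeans(scores)

  # Gather each repetition's "itInfo" list under a single "itsInfo" attribute
  attr(estimate, "itsInfo") <- lapply(runs, attr, which = "itInfo")
  estimate
}

set.seed(1)
res <- simple_hold_out(dist ~ speed, cars, lm_eval, reps = 3)
its <- attr(res, "itsInfo")  # list with one element per repetition
head(its[[1]]$preds)         # predictions made in the first repetition
```

Keeping the extras in an attribute leaves the returned statistics a
plain named vector, so averaging across repetitions needs no special
handling of the additional information.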